Abstract. We propose a new algorithm for recommender systems with numeric ratings which is based on Pattern Structures (RAPS). As input, the algorithm takes a rating matrix, e.g., one containing movies rated by users. For a target user, the algorithm returns a ranked list of items (movies) based on the user's previous ratings and the ratings of other users. We compare the results of the proposed algorithm in terms of precision and recall with Slope One, one of the state-of-the-art item-based algorithms, on the MovieLens dataset, and RAPS demonstrates the best or comparable quality.
1 Introduction and related work

Formal Concept Analysis (FCA) [1] is a powerful algebraic framework for knowledge representation and processing [2,3]. However, in its original formulation it mainly deals with Boolean data. Even though numeric data can be represented by a so-called many-valued context, such a context has to be transformed into a plain context (i.e. a binary object-attribute table) by conceptual scaling. There are several extensions of FCA to the numeric setting, such as Fuzzy Formal Concept Analysis [4,5]. In this paper, to recommend items of interest to a particular user, we use Pattern Structures, an extension of FCA that deals with data having ordered descriptions. More precisely, we use interval pattern structures, which were proposed in [6] and successfully applied, e.g., to gene expression data analysis [7].

The task of recommending items to users according to their preferences, expressed by ratings of previously used items, became extremely popular during the last decade, partly because of the famous one-million-dollar Netflix Prize competition [8]. Numerous algorithms have been proposed to this end. In this paper we mainly study item-based approaches. Our main goal is to see whether FCA-based approaches are directly applicable to recommender systems with numeric data. Previous approaches used concept lattices for navigating the recommender space and allowed relevant items to be recommended faster than by online computation in the user-based approach; however, they require expensive offline computation and substantial storage space [9]. Another approach effectively uses Boolean factorisation based on formal concepts and follows the user-based k-nearest-neighbours strategy [10]. A parameter-free approach that exploits a neighbourhood of the object concept for a particular user has also proved its effectiveness [11]; it has a predecessor based on object-attribute biclusters [12] that also captures the neighbourhood of every user and item pair in an input formal context. However, it seems that within the FCA framework, item-based techniques for data with ratings have not been proposed so far. This paper aims to bridge this gap.

The paper is organised as follows. In Section 2, basic FCA definitions and interval pattern structures are introduced. Section 3 describes SlopeOne [13] and RAPS with examples. In Section 4 we provide the results of experiments on time performance and precision-recall evaluation for the MovieLens dataset. Section 5 concludes the paper.

2 Basic definitions 

Formal Concept Analysis. First, we recall several basic notions of Formal Concept Analysis (FCA) [1]. Let $G$ and $M$ be sets, called the set of objects and the set of attributes, respectively, and let $I$ be a relation $I \subseteq G \times M$: for $g \in G$, $m \in M$, $gIm$ holds iff the object $g$ has the attribute $m$. The triple $\mathbb{K} = (G, M, I)$ is called a (formal) context. If $A \subseteq G$ and $B \subseteq M$ are arbitrary subsets, then the Galois connection is given by the following derivation operators:


$$A' = \{\, m \in M \mid gIm \text{ for all } g \in A \,\},$$

$$B' = \{\, g \in G \mid gIm \text{ for all } m \in B \,\}.$$

The pair $(A, B)$, where $A \subseteq G$, $B \subseteq M$, $A' = B$, and $B' = A$, is called a (formal) concept (of the context $\mathbb{K}$) with extent $A$ and intent $B$ (in this case we also have $A'' = A$ and $B'' = B$).

The concepts, ordered by $(A_1, B_1) \ge (A_2, B_2) \iff A_1 \supseteq A_2$, form a complete lattice, called the concept lattice $\mathfrak{B}(G, M, I)$.
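For readers who prefer to see the definitions executed, the following Python sketch implements the two derivation operators on a small hypothetical context (the objects g1, g2, g3 and attributes a, b, c are invented for illustration) and enumerates its formal concepts by closing every subset of objects. This brute-force enumeration is only meant for tiny contexts; practical FCA tools rely on dedicated algorithms such as NextClosure.

```python
from itertools import combinations

# Hypothetical toy context K = (G, M, I); I is stored as a set of (object, attribute) pairs.
G = {"g1", "g2", "g3"}
M = {"a", "b", "c"}
I = {("g1", "a"), ("g1", "b"), ("g2", "a"), ("g2", "c"), ("g3", "a"), ("g3", "b")}


def up(A):
    """A': attributes shared by all objects in A (for the empty set this is all of M)."""
    return {m for m in M if all((g, m) in I for g in A)}


def down(B):
    """B': objects having all attributes in B."""
    return {g for g in G if all((g, m) in I for m in B)}


def is_concept(A, B):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return up(A) == B and down(B) == A


# Every concept (A, B) equals (A'', A'), so closing all subsets of G finds all concepts.
concepts = set()
for k in range(len(G) + 1):
    for A in combinations(sorted(G), k):
        B = up(set(A))
        concepts.add((frozenset(down(B)), frozenset(B)))
```

Here, for instance, ({g1, g3}, {a, b}) is a concept, whereas ({g1}, {a, b}) is not, because {a, b}' = {g1, g3}; this toy context has four concepts in total.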

Pattern Structures. Let $G$ be a set of objects and $D$ be a set of all possible object descriptions. Let $\sqcap$ be a similarity operator. It allows us to work not only with objects that have binary attributes, as in the traditional FCA setting, but also with objects that have complex descriptions such as intervals or graphs. Then $(D, \sqcap)$ is a meet-semilattice of object descriptions. The mapping $\delta: G \to D$ assigns to an object $g$ its description $d \in (D, \sqcap)$.

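A triple $(G, (D, \sqcap), \delta)$ is a pattern structure. Two operators $(\cdot)^{\square}$ define a Galois connection between $(2^G, \subseteq)$ and $(D, \sqcap)$:

$$A^{\square} = \bigsqcap_{g \in A} \delta(g) \quad \text{for } A \subseteq G, \tag{2}$$

$$d^{\square} = \{\, g \in G \mid d \sqsubseteq \delta(g) \,\} \quad \text{for } d \in (D, \sqcap), \tag{3}$$

where $d \sqsubseteq \delta(g) \iff d \sqcap \delta(g) = d$.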


For a set of objects $A$, operator (2) returns the common description (pattern) of all objects from $A$. For a description $d$, operator (3) returns the set of all objects $g$ with $d \sqsubseteq \delta(g)$, i.e. all objects whose descriptions contain the pattern $d$.


A pair $(A, d)$ such that $A \subseteq G$ and $d \in (D, \sqcap)$ is called a pattern concept of the pattern structure $(G, (D, \sqcap), \delta)$ iff $A^{\square} = d$ and $d^{\square} = A$. In this case $A$ is called a pattern extent and $d$ is called a pattern intent of the pattern concept $(A, d)$. Pattern concepts are partially ordered by $(A_1, d_1) \le (A_2, d_2) \iff A_1 \subseteq A_2$ ($\iff d_2 \sqsubseteq d_1$). The set of all pattern concepts forms a complete lattice called a pattern concept lattice.

Intervals as patterns. A natural requirement for a similarity operator on intervals is that the similarity of two intervals should be an interval that contains both of them. We take this new interval to be the minimal one containing the two original intervals. Let $[a_1, b_1]$ and $[a_2, b_2]$ be two intervals such that $a_1, b_1, a_2, b_2 \in \mathbb{R}$, $a_1 \le b_1$ and $a_2 \le b_2$; then their similarity is defined as follows:


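$$[a_1, b_1] \sqcap [a_2, b_2] = [\min(a_1, a_2), \max(b_1, b_2)].$$

Therefore

$$\begin{aligned}
[a_1, b_1] \sqsubseteq [a_2, b_2] &\iff [a_1, b_1] \sqcap [a_2, b_2] = [a_1, b_1] \\
&\iff [\min(a_1, a_2), \max(b_1, b_2)] = [a_1, b_1] \\
&\iff a_1 \le a_2 \text{ and } b_1 \ge b_2 \iff [a_1, b_1] \supseteq [a_2, b_2].
\end{aligned}$$

Note that $a \in \mathbb{R}$ can be represented by $[a, a]$.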
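The equivalence chain above can be sanity-checked by brute force. The following Python sketch (the helper names `meet`, `subsumes` and `contains` are our own) compares the order induced by the interval similarity with plain interval containment for all integer intervals on the 1 to 5 rating scale used later in the paper:

```python
from itertools import product


def meet(x, y):
    """[a1,b1] ⊓ [a2,b2] = [min(a1,a2), max(b1,b2)]."""
    return (min(x[0], y[0]), max(x[1], y[1]))


def subsumes(x, y):
    """x ⊑ y iff x ⊓ y = x."""
    return meet(x, y) == x


def contains(x, y):
    """[a1,b1] ⊇ [a2,b2] iff a1 <= a2 and b2 <= b1."""
    return x[0] <= y[0] and y[1] <= x[1]


# All intervals [a, b] with 1 <= a <= b <= 5.
intervals = [(a, b) for a in range(1, 6) for b in range(a, 6)]
agree = all(subsumes(x, y) == contains(x, y) for x, y in product(intervals, repeat=2))
```

`agree` evaluates to True, which matches the last equivalence of the derivation: on intervals, $\sqsubseteq$ is reverse inclusion.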

Interval vectors as patterns. We call $p$-dimensional vectors of intervals interval vectors. In this case, for two interval vectors of the same dimension $e = \langle [a_i, b_i] \rangle_{i \in [1,p]}$ and $f = \langle [c_i, d_i] \rangle_{i \in [1,p]}$, we define the similarity operation componentwise via the similarity of the corresponding intervals, i.e.:

$$e \sqcap f = \langle [a_i, b_i] \rangle_{i \in [1,p]} \sqcap \langle [c_i, d_i] \rangle_{i \in [1,p]} = \langle [a_i, b_i] \sqcap [c_i, d_i] \rangle_{i \in [1,p]}.$$

Note that interval vectors are also partially ordered:

$$e \sqsubseteq f \iff \langle [a_i, b_i] \rangle_{i \in [1,p]} \sqsubseteq \langle [c_i, d_i] \rangle_{i \in [1,p]} \iff [a_i, b_i] \sqsubseteq [c_i, d_i]$$

for all $i \in [1, p]$.
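A minimal Python sketch of interval vectors as patterns follows; it implements componentwise similarity, the induced order, and the operators (2) and (3) for objects described by interval vectors. The objects g1, g2, g3 and their descriptions are hypothetical, and operator (2) is only defined here for non-empty object sets.

```python
from functools import reduce
from typing import Dict, Set, Tuple

Interval = Tuple[float, float]   # (a, b) with a <= b
IVec = Tuple[Interval, ...]      # a p-dimensional interval vector


def meet(x: Interval, y: Interval) -> Interval:
    """Interval similarity: the smallest interval containing both x and y."""
    return (min(x[0], y[0]), max(x[1], y[1]))


def vmeet(e: IVec, f: IVec) -> IVec:
    """Componentwise similarity of two interval vectors of the same dimension."""
    if len(e) != len(f):
        raise ValueError("interval vectors must have the same dimension")
    return tuple(meet(x, y) for x, y in zip(e, f))


def vsubsumes(e: IVec, f: IVec) -> bool:
    """e ⊑ f iff e ⊓ f = e, i.e. every interval of e contains the matching one of f."""
    return vmeet(e, f) == e


def box_objects(A: Set[str], delta: Dict[str, IVec]) -> IVec:
    """Operator (2): common description of a non-empty set of objects A."""
    return reduce(vmeet, (delta[g] for g in sorted(A)))


def box_description(d: IVec, delta: Dict[str, IVec]) -> Set[str]:
    """Operator (3): all objects g with d ⊑ δ(g)."""
    return {g for g, desc in delta.items() if vsubsumes(d, desc)}


# Hypothetical objects described by 2-dimensional vectors of point intervals [a, a].
delta = {"g1": ((1, 1), (3, 3)), "g2": ((2, 2), (5, 5)), "g3": ((4, 4), (4, 4))}
d = box_objects({"g1", "g2"}, delta)   # ((1, 2), (3, 5))
A = box_description(d, delta)          # the set {"g1", "g2"}
```

In this toy example $(\{g_1, g_2\}, \langle [1,2], [3,5] \rangle)$ is a pattern concept: the common description of g1 and g2 is $\langle [1,2], [3,5] \rangle$, and g3 is excluded because $4 \notin [1,2]$.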

3 Recommender Algorithms 

3.1 Slope One 

Slope One is one of the common approaches to recommendations based on collaborative filtering. Despite its simplicity, it demonstrates quality comparable to that of more complex and resource-demanding algorithms [14]. As shown in [14], Slope One has the highest recall on the MovieLens and Netflix datasets and an acceptable level of precision: “Overall, the algorithms that present the best results with these metrics are SVD techniques, tendencies-based and slope one (although its precision is not outstanding).”

We use this algorithm for comparison purposes.

Slope One takes rating matrices as input data. In what follows, the data contain movie ratings by different users. That is, $M = \{m_1, m_2, \ldots, m_n\}$ is a set of movies and $U = \{u_1, u_2, \ldots, u_k\}$ is a set of users. The rating matrix can be represented by a many-valued formal context $(U, M, R, I)$, where $R = \{1, 2, 3, 4, 5, *\}$ is the set of possible ratings and a triple $(u, m, r) \in I$ means that user $u$ gave rating $r$ to movie $m$. Whenever convenient, we also use the notation $r_{i,j}$ for the rating of movie $m_j$ by user $u_i$.

If a user $u$ has not rated a movie $m$, we write $m(u) = *$, i.e. a missing rating.

Let us describe the algorithm step by step. 

1. The algorithm takes a many-valued context of all users' ratings and the target user $u_t$ for whom it generates recommendations. It also requires left_border and right_border for the acceptable level of ratings, i.e. if one wants to receive all movies with ratings between 4 and 5, then the left and right borders should be 4 and 5, respectively. The last pair of parameters are the minimal and maximal scores (min_border and max_border) admissible for the data. This means that if the algorithm predicts a score of 6.54 and the maximal score is bounded by 5, then 6.54 is treated as 5.

2. The algorithm finds the set $S(u_t)$ of all movies rated by the target user.

3. For every movie $m_j \in M \setminus S(u_t)$ not rated by $u_t$, execute step 4, thereby computing the predicted rating for $m_j$. After that, go to step 5.

4. For every movie $m_i \in S(u_t)$ rated by $u_t$, compute $S_{j,i}(U \setminus \{u_t\})$, the set of users who rated both $m_j$ and $m_i$. If $S_{j,i}(U \setminus \{u_t\})$ is non-empty, i.e. $|S_{j,i}(U \setminus \{u_t\})| > 0$, compute the deviation

$$dev_{j,i} = \sum_{u_k \in S_{j,i}(U \setminus \{u_t\})} \frac{r_{k,j} - r_{k,i}}{|S_{j,i}(U \setminus \{u_t\})|}$$

and add $i$ to $R_j$.

After all deviations have been found, compute the predicted rating

$$P(u_t)_j = \frac{1}{|R_j|} \sum_{i \in R_j} \left( dev_{j,i} + r_{t,i} \right),$$

where $R_j = \{\, i \mid m_i \in S(u_t),\ i \ne j,\ |S_{j,i}(U \setminus \{u_t\})| > 0 \,\}$. If $R_j$ is empty, the algorithm cannot make a prediction.

5. At this point, Slope One has found all predicted ratings $P(u_t)$ for movies from $M \setminus S(u_t)$. The algorithm recommends all movies whose predicted ratings lie in the preferred range left_border $\le P(u_t)_j \le$ right_border, taking into account the minimal and maximal admissible values.

If one needs the top-N ranked items, one can sort the predicted scores from the resulting set in decreasing order and select the first N corresponding movies.
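The steps above can be sketched in Python as follows. This is our illustrative reading of the formulas in this section (an unweighted average of $dev_{j,i} + r_{t,i}$ over $R_j$, followed by clipping to [min_border, max_border]), not the authors' MATLAB implementation; the nested-dictionary rating representation, in which a missing key stands for *, and the function names are assumptions.

```python
from typing import Dict, List, Optional

# ratings[user][movie] = rating; a missing key stands for '*' (no rating).
Ratings = Dict[str, Dict[str, float]]


def slope_one_predict(ratings: Ratings, target: str, movie: str,
                      min_border: float, max_border: float) -> Optional[float]:
    """Steps 3-4: predicted rating P(u_t)_j, or None when R_j is empty."""
    terms = []  # one value dev_{j,i} + r_{t,i} per i in R_j
    for other, r_ti in ratings[target].items():  # movies m_i rated by u_t
        if other == movie:
            continue
        # S_{j,i}(U \ {u_t}): the other users who rated both m_j and m_i
        common = [u for u, r in ratings.items()
                  if u != target and movie in r and other in r]
        if not common:
            continue
        dev = sum(ratings[u][movie] - ratings[u][other] for u in common) / len(common)
        terms.append(dev + r_ti)
    if not terms:
        return None
    prediction = sum(terms) / len(terms)
    return min(max(prediction, min_border), max_border)  # clip to admissible scores


def slope_one_recommend(ratings: Ratings, target: str, left_border: float,
                        right_border: float, min_border: float,
                        max_border: float) -> Dict[str, float]:
    """Step 5: unrated movies whose clipped prediction is in [left_border, right_border]."""
    all_movies = {m for r in ratings.values() for m in r}
    recs = {}
    for m in sorted(all_movies - set(ratings[target])):
        p = slope_one_predict(ratings, target, m, min_border, max_border)
        if p is not None and left_border <= p <= right_border:
            recs[m] = p
    return recs


def top_n(recs: Dict[str, float], n: int) -> List[str]:
    """Top-N movies by decreasing predicted score."""
    return sorted(recs, key=recs.get, reverse=True)[:n]
```

On the data of Table 1, `slope_one_predict(ratings, "u3", "m1", 1, 5)` yields 5; the unclipped value is 5.25, as in Example 1.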

Example 1. 

Consider the execution of Slope One on the dataset from Table 1.


Table 1. Example of data for Slope One

user\movie  m1  m2  m3
u1           5   3   2
u2           3   4   *
u3           *   2   5

Let us predict the rating of user $u_3$ for movie $m_1$.

1. Let left_border = 4, right_border = 5, min_border = 1, and max_border = 5.

2. We find $S(u_3) = \{m_2, m_3\}$, the set of movies rated by the target user.

3. $M \setminus S(u_3) = \{m_1\}$

4. $S_{1,2}(U \setminus \{u_3\}) = \{u_1, u_2\}$

$$dev_{1,2} = \frac{(r_{1,1} - r_{1,2}) + (r_{2,1} - r_{2,2})}{|S_{1,2}(U \setminus \{u_3\})|} = \frac{(5 - 3) + (3 - 4)}{2} = 0.5$$

$S_{1,3}(U \setminus \{u_3\}) = \{u_1\}$

$$dev_{1,3} = \frac{r_{1,1} - r_{1,3}}{|\{u_1\}|} = \frac{5 - 2}{1} = 3$$

$R_1 = \{2, 3\}$

$$P(u_3)_1 = \frac{1}{|R_1|} \left( dev_{1,2} + r_{3,2} + dev_{1,3} + r_{3,3} \right) = \frac{1}{2}(0.5 + 2 + 3 + 5) = 5.25$$

5. Taking into account the maximal rating boundary, the algorithm predicts 5 for movie $m_1$ and therefore recommends it to user $u_3$.

3.2 RAPS 

Our approach, RAPS (Recommender Algorithm based on Pattern Structures), works with the same many-valued context as Slope One.

Let us describe the algorithm. 

1. It takes the context $(U, M, R, I)$ with all ratings and a target user $u_t$. It also requires left_border and right_border for preferred ratings, i.e. if one wants to get all movies rated in the range from 4 to 5, then left_border = 4 and right_border = 5.

2. Define the set of movies $M_t = \{m_{t_1}, \ldots, m_{t_q}\}$ that the target user $u_t$ liked, i.e. the ones that she rated within $[\text{left\_border}, \text{right\_border}]$.

3. For each movie $m_{t_i} \in M_t$, apply Eq. (3) to find the set of users who liked the movie, $A_{t_i} = [\text{left\_border}, \text{right\_border}]^{\square}$ for $1 \le i \le q$. As a result, one obtains the family of user subsets $\{A_{t_1}, \ldots, A_{t_q}\}$.

4. For each $A_{t_i}$, $1 \le i \le q$, apply Eq. (2) to find its description; in our case it is a vector of intervals $d_{t_i} = A_{t_i}^{\square} = \langle [a^i_1, b^i_1], \ldots, [a^i_n, b^i_n] \rangle$. Note that if a particular user $u_x$ from $A_{t_i}$ has not rated $m_y$, i.e. $r_{x,y} = *$, then the algorithm does not take this missing rating into account.

5. Finally, compute the vector $r = (R_1, \ldots, R_n) \in \mathbb{N}^n$ (or $\mathbb{R}^n$ in the general case), where

$$R_j = \left| \{\, i \mid 1 \le i \le q,\ [a^i_j, b^i_j] \subseteq [\text{left\_border}, \text{right\_border}] \,\} \right|,$$

i.e. for each movie $m_j$ the algorithm counts how many of its description intervals $[a^i_j, b^i_j]$ lie within $[\text{left\_border}, \text{right\_border}]$. If $R_j > 0$, the algorithm recommends watching the movie.

Top-N movies with the highest values of $R_j$ can be selected in a similar way.

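Let us briefly discuss the time complexity. Step 2 requires $O(|M|)$ operations, and for each of the $q$ liked movies, steps 3, 4 and 5 take at most $O(|M||U|)$ operations. Therefore, the time complexity of the algorithm is bounded by $O(q|M||U|)$, i.e. by $O(|M||U|)$ when the number $q$ of liked movies is treated as a constant.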
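A compact Python sketch of RAPS in the same spirit follows. It is not the authors' implementation, and two details are our assumptions: the target user is excluded from each extent $A_{t_i}$ (this matches Example 2), and a movie for which no user in $A_{t_i}$ has a known rating simply contributes no interval. The function names are hypothetical.

```python
from typing import Dict, List, Optional

# ratings[user][movie] = rating; a missing key stands for '*' (no rating).
Ratings = Dict[str, Dict[str, float]]


def raps_scores(ratings: Ratings, target: str, movies: List[str],
                left_border: float, right_border: float) -> Dict[str, int]:
    """Return R_j for every movie m_j, i.e. the vector r of step 5."""
    def in_range(r: Optional[float]) -> bool:
        return r is not None and left_border <= r <= right_border

    # Step 2: M_t, the movies the target user rated within the preferred interval.
    liked = [m for m in movies if in_range(ratings[target].get(m))]
    scores = {m: 0 for m in movies}
    for m in liked:
        # Step 3: extent A_{t_i}; the target user is excluded, as in Example 2 (assumption).
        extent = [u for u in ratings if u != target and in_range(ratings[u].get(m))]
        # Step 4: description of the extent, one interval [min, max] per movie,
        # computed over known ratings only ('*' is ignored).
        for mj in movies:
            known = [ratings[u][mj] for u in extent if mj in ratings[u]]
            if not known:
                continue  # no interval can be formed; skipped here (assumption)
            # Step 5: count descriptions lying inside [left_border, right_border].
            if left_border <= min(known) and max(known) <= right_border:
                scores[mj] += 1
    return scores


def raps_recommend(scores: Dict[str, int], ratings: Ratings, target: str) -> List[str]:
    """Unrated movies with R_j > 0, ordered by decreasing R_j (top-N is a prefix)."""
    candidates = [m for m, s in scores.items() if s > 0 and m not in ratings[target]]
    return sorted(candidates, key=lambda m: scores[m], reverse=True)
```

Applied to the data of Table 2 with target u7 and borders 4 and 5, `raps_scores` gives R = (1, 1, 0, 1, 2, 0) and `raps_recommend` returns m5 followed by m4.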

Example 2 

Consider the execution of RAPS on the tiny dataset from Table 2.

Let us find a recommendation for user $u_7$.



Table 2. Example of data for RAPS

user\movie  m1  m2  m3  m4  m5  m6
u1           5   3   1   3   5   3
u2           4   4   1   5   4   3
u3           5   *   *   3   *   4
u4           *   3   4   *   2   4
u5           4   *   4   5   4   *
u6           3   4   5   5   *   3
u7           5   4   2   *   *   *


1. The input of the algorithm: $t = 7$, left_border = 4 and right_border = 5.

2. $M_7 = \{m_1, m_2\}$

3. $A_1 = [4,5]^{\square} = \{u_1, u_2, u_3, u_5\}$ (for movie $m_1$)
$A_2 = [4,5]^{\square} = \{u_2, u_6\}$ (for movie $m_2$)

4. $d_1 = A_1^{\square} = \langle [a^1_1, b^1_1], [a^1_2, b^1_2], \ldots, [a^1_6, b^1_6] \rangle = \langle [4,5], [3,4], [1,4], [3,5], [4,5], [3,4] \rangle$

For example, the interval $[a^1_6, b^1_6]$ is found as follows:

$$[a^1_6, b^1_6] = [\min(r_{1,6}, r_{2,6}, r_{3,6}, r_{5,6}), \max(r_{1,6}, r_{2,6}, r_{3,6}, r_{5,6})] = [\min(3,3,4,*), \max(3,3,4,*)] = [\min(3,3,4), \max(3,3,4)] = [3,4].$$

The remaining intervals are found in a similar way.

$d_2 = A_2^{\square} = \langle [3,4], [4,4], [1,5], [5,5], [4,4], [3,3] \rangle$

5. Taking into account the left and right borders, the algorithm recommends movies $m_1$ and $m_5$ from $d_1$ and $m_2$, $m_4$ and $m_5$ from $d_2$. Therefore $R = (1, 1, 0, 1, 2, 0)$, i.e. excluding the movies already rated by $u_7$, we recommend that she watch $m_4$ and $m_5$.
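The intermediate results of Example 2 can be reproduced with a short, self-contained Python script. Table 2 is encoded as lists with None standing for *, and, as in the example, the target user u7 is excluded from the extents; the names `extent` and `description` are our own.

```python
from typing import List, Tuple

MOVIES = ["m1", "m2", "m3", "m4", "m5", "m6"]
TABLE2 = {  # Table 2, with None standing for the missing rating '*'
    "u1": [5, 3, 1, 3, 5, 3],
    "u2": [4, 4, 1, 5, 4, 3],
    "u3": [5, None, None, 3, None, 4],
    "u4": [None, 3, 4, None, 2, 4],
    "u5": [4, None, 4, 5, 4, None],
    "u6": [3, 4, 5, 5, None, 3],
    "u7": [5, 4, 2, None, None, None],
}


def extent(j: int, lo: int, hi: int, target: str) -> List[str]:
    """[lo, hi]^□ for movie j: users other than the target whose rating lies in [lo, hi]."""
    return [u for u, row in TABLE2.items()
            if u != target and row[j] is not None and lo <= row[j] <= hi]


def description(users: List[str]) -> List[Tuple[int, int]]:
    """Operator (2) movie by movie: [min, max] over known ratings, '*' skipped.
    Assumes every movie has at least one known rating among `users`."""
    desc = []
    for j in range(len(MOVIES)):
        known: List[int] = [TABLE2[u][j] for u in users if TABLE2[u][j] is not None]
        desc.append((min(known), max(known)))
    return desc


A1, A2 = extent(0, 4, 5, "u7"), extent(1, 4, 5, "u7")   # movies m1 and m2
d1, d2 = description(A1), description(A2)
# Step 5: R_j counts the descriptions whose j-th interval lies inside [4, 5].
R = [sum(1 for d in (d1, d2) if 4 <= d[j][0] and d[j][1] <= 5) for j in range(len(MOVIES))]
```

The script yields the extents $A_1$ and $A_2$, the description vectors $d_1$ and $d_2$, and $R = (1, 1, 0, 1, 2, 0)$ exactly as computed by hand in the example.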


4 Experimental evaluation 

4.1 Data 

For our experiments we used freely available data from the MovieLens website¹. The data were collected within the GroupLens Research Project at the University of Minnesota in 1997-1998. The data contain 100,000 ratings of 1682 movies by 943 users. Each user rated at least 20 movies. That is, we have 100,000 tuples of the form:

user id | item id | rating | timestamp.

Each tuple contains the user id, the movie id, the rating she gave to the movie, and the time when the rating was made.

1 http://grouplens.org/datasets/movielens/ 
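To our knowledge, the ratings file of the MovieLens 100K release (u.data) is tab-separated with the four columns listed above; the loader below assumes that layout and groups the tuples by user. The function name is hypothetical, and the sample lines are made up for illustration rather than taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def load_ratings(f: TextIO) -> Dict[int, List[Tuple[int, int, int]]]:
    """Map each user id to a list of (item id, rating, timestamp) tuples."""
    per_user: Dict[int, List[Tuple[int, int, int]]] = defaultdict(list)
    for user, item, rating, ts in csv.reader(f, delimiter="\t"):
        per_user[int(user)].append((int(item), int(rating), int(ts)))
    return dict(per_user)


# Made-up sample lines in the assumed tab-separated layout.
sample = io.StringIO("1\t10\t4\t880000100\n1\t20\t5\t880000000\n2\t10\t3\t880000050\n")
per_user = load_ratings(sample)
```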











4.2 Quality assessment 


Firstly, for the quality assessment of Slope One and RAPS we used precision and recall measures. Note that we cannot use the Mean Absolute Error (MAE) directly, since RAPS actually assigns a whole interval like [4,5] to a particular movie, not a number. We select 20% of users to form our test set, and for each test user we split her rated movies into two parts: the visible set and the hidden set. The first set consists of 80% of the rated movies, and the second one contains the remaining 20%. Moreover, to make the comparison more realistic, movies from the first set were rated earlier than those from the second one, i.e. we first sort all ratings of a given user by timestamp and then perform the split.
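The split described above can be sketched as follows. The random choice of test users, the seed, and rounding the 80% cut-off down are our assumptions; the paper does not specify them, and the function name is hypothetical.

```python
import random
from typing import Dict, List, Tuple

Rating = Tuple[int, int, int]  # (item id, rating, timestamp)
PerUser = Dict[int, List[Rating]]


def temporal_split(per_user: PerUser, test_share: float = 0.2,
                   visible_share: float = 0.8,
                   seed: int = 0) -> Tuple[PerUser, PerUser, PerUser]:
    """Return (training users, visible part of test users, hidden part of test users)."""
    rng = random.Random(seed)
    users = sorted(per_user)
    test_users = set(rng.sample(users, round(test_share * len(users))))
    train: PerUser = {}
    visible: PerUser = {}
    hidden: PerUser = {}
    for u in users:
        if u not in test_users:
            train[u] = per_user[u]          # training users keep all their ratings
            continue
        ordered = sorted(per_user[u], key=lambda t: t[2])  # earliest ratings first
        cut = int(visible_share * len(ordered))             # assumption: round down
        visible[u], hidden[u] = ordered[:cut], ordered[cut:]
    return train, visible, hidden
```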

There is a more general testing scheme based on bimodal cross-validation [15], which seems to us the most natural and realistic: a test set of x% of users is selected, each test user keeps only part of her rated movies, and the remaining y% of her ratings are hidden. Thus, by considering each test user in this way, we model a real user whose ratings of other movies are not yet known, while we have all rating information about the training set of users. In other words, we hide only a rectangle formed by x% of test users and y% of their items. One can vary x and y when investigating the behaviour of the methods under comparison, with the size of the top-N recommended list set equal to y%. The hidden items can be selected randomly or by timestamp (the latter is preferable for a realistic scenario). Note that there is no gold-standard approach to testing recommender systems, although validated sophisticated schemes exist [16]. The main reason is the following: with only offline data at hand, we cannot verify whether the user would like a not-yet-seen movie, even assuming that she has seen our recommendation. However, for real systems there is a remedy, A/B testing, which is applicable only in an online setting [17].

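The adjusted precision and recall are defined as follows:

$$\text{precision} = \frac{|\{\text{relevant movies}\} \cap \{\text{retrieved movies}\} \cap \{\text{test movies}\}|}{|\{\text{retrieved movies}\} \cap \{\text{test movies}\}|} \tag{4}$$

$$\text{recall} = \frac{|\{\text{relevant movies}\} \cap \{\text{retrieved movies}\} \cap \{\text{test movies}\}|}{|\{\text{relevant movies}\} \cap \{\text{test movies}\}|} \tag{5}$$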


These measures allow us to avoid uncertainty, since we do not know how a particular user would actually rate a recommended movie. In a real recommender system we would rather ask the user whether the recommendation was relevant, but in our offline quality assessment scheme we cannot do that. In other words, we assume that for a test user at the moment of assessment there are no movies except the training and test ones.

Another issue, which is often omitted in papers on recommender systems, is how to resolve the uncertainty when the denominators in Precision and Recall are equal to zero (not necessarily simultaneously).

To define the measures precisely, based on the peculiarities of the recommendation task and common sense, we use two types of definitions for the cases when the sets of retrieved and relevant movies for a particular user and recommender are empty.

Precision and Recall of the first type are defined as follows: 




- If the sets of relevant movies and retrieved ones are empty, then Precision = 0 and 
Recall = 1. 

- If the set of relevant movies is empty, but the set of retrieved ones is not, then 
Precision = 0 and Recall = 1. 

- If we have the non-empty set of relevant movies, but the set of retrieved movies is 
empty, then Precision = 0 and Recall = 0. 

Precision of the second type is less strict, while Recall remains the same as for the first type:

- If the sets of relevant movies and retrieved ones are empty, then Precision = 1 (since 
we should not recommend any movie and the recommender has not recommended 
anything). 

- If the set of relevant movies is empty, but the set of retrieved ones is not, then 
Precision = 0. 

- If we have the non-empty set of relevant movies, but the set of retrieved movies is 
empty, then Precision = 1 (since the recommender has not recommended anything, 
its output does not contain any non-relevant movie). 
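Below is a minimal Python sketch of the adjusted precision (4) and recall (5) with both conventions for empty sets. We assume that "empty" refers to the sets of relevant and retrieved movies restricted to the test movies, i.e. to the denominators of (4) and (5); the function name is hypothetical.

```python
from typing import Set, Tuple


def adjusted_pr(relevant: Set[str], retrieved: Set[str], test: Set[str],
                kind: int = 1) -> Tuple[float, float]:
    """Adjusted (precision, recall) of type `kind` (1 or 2) for one test user."""
    rel = relevant & test          # denominator set of recall (5)
    ret = retrieved & test         # denominator set of precision (4)
    hits = len(rel & ret)          # |relevant ∩ retrieved ∩ test|
    if rel and ret:
        return hits / len(ret), hits / len(rel)
    if not rel and not ret:        # nothing relevant, nothing retrieved
        return (0.0 if kind == 1 else 1.0), 1.0
    if not rel:                    # nothing relevant, something retrieved
        return 0.0, 1.0
    # something relevant, nothing retrieved
    return (0.0 if kind == 1 else 1.0), 0.0
```

Per-user values of this kind would then presumably be averaged over all test users.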

4.3 Results 

We have performed three series of tests: 

1. A movie is worth watching if its predicted mark is 5 (i.e. it is [5,5]).

2. A mark is good if it is from [4,5].

3. Any mark from [3,5] is good.

All the tests were performed on OS X 10.9.3 with an Intel Core i7 2.3 GHz and 8 GB of memory. The algorithms were implemented in MATLAB R2013a. The results are presented in Table 3. Note that the reported Precision and Recall are of the first type.


Table 3. RAPS vs Slope One results

Algorithm   Preference  Average   Average    Average  F1-measure
name        interval    time, s   precision  recall
RAPS        [5,5]        3.62     19.42      50.52    28.06
Slope One   [5,5]       18.90      1.57      23.41     2.94
RAPS        [4,5]       18.23     55.61      63.33    59.22
Slope One   [4,5]       18.90     53.99      30.39    38.89
RAPS        [3,5]       32.98     80.11      83.65    81.84
Slope One   [3,5]       18.90     83.81      81.88    82.83


The criteria are the average execution time in seconds, average precision and average recall. From the table, one can see that RAPS is drastically better than Slope One by all criteria for [5,5]. For the [4,5] interval, both approaches have comparable time and precision, but Slope One has about two times lower recall. For the [3,5] interval, the algorithms demonstrate similar precision and recall, but RAPS is about 1.7 times slower on average.
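The paper does not state how the F1-measure column was obtained. Assuming it is the harmonic mean of the reported average precision and recall (rather than an average of per-user F1 values), the following Python check recomputes it from Table 3:

```python
TABLE3 = {  # (algorithm, interval): (average precision, average recall, reported F1)
    ("RAPS", "[5,5]"): (19.42, 50.52, 28.06),
    ("Slope One", "[5,5]"): (1.57, 23.41, 2.94),
    ("RAPS", "[4,5]"): (55.61, 63.33, 59.22),
    ("Slope One", "[4,5]"): (53.99, 30.39, 38.89),
    ("RAPS", "[3,5]"): (80.11, 83.65, 81.84),
    ("Slope One", "[3,5]"): (83.81, 81.88, 82.83),
}


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Rows whose reported F1 differs from the recomputed value by more than rounding error.
mismatches = [k for k, (p, r, f) in TABLE3.items() if abs(f1(p, r) - f) > 0.01]
```

Under this assumption every row matches the reported value after rounding to two decimals, so `mismatches` is empty.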











However, since the compared approaches differ from the output point of view (RAPS provides the user with an interval of possible ratings, whereas SlopeOne provides a single real number), we perform a more thorough comparison by varying the lower bound of acceptable recommendations and using both types of the adjusted precision and recall measures.

From Fig. 1 one can conclude that RAPS dominates SlopeOne by Recall in most cases. As for Precision, even though RAPS is significantly better for the [5,5] interval, starting from the lower bound of 4.4 SlopeOne shows comparable but slightly better Precision in most cases.
Fig. 1. Precision and Recall of the first type for RAPS and SlopeOne for the varying lower bound


From Fig. 2 one can see that SlopeOne is significantly better in terms of precision. Only for the interval [3,5] (i.e. when the lower bound equals 3) is the difference between SlopeOne and RAPS negligible. A reasonable explanation is as follows: for SlopeOne there are more cases when $\{\text{retrieved movies}\} = \emptyset$, irrespective of the size of $\{\text{relevant movies}\}$. Recall that in such cases Precision of the second type equals 1. In other words, SlopeOne is really more precise (or even concise): in such cases it simply does not recommend anything. However, in the movie recommendation domain one can hardly judge a recommender to be good when it does not recommend anything.

Fig. 2. Precision of the second type for RAPS and SlopeOne for the varying lower bound

We can conclude that the proposed recommender technique based on pattern structures deserves to be used. Since the Slope One algorithm has been used in real recommender systems [18], we can suggest our technique for practical use as well.

5 Conclusion and further work 

In this paper we proposed a technique for movie recommendation based on Pattern Structures (RAPS). Even though this algorithm is oriented towards movie recommendations, it can easily be used in other recommender domains where users rate items.

The experiments (RAPS vs Slope One) showed that the recommender system based on Pattern Structures demonstrates acceptable precision, better recall in most cases, and reasonable execution time.

Of course, in the future RAPS should be compared with other recommender techniques to draw a final conclusion about its applicability. The interplay between interval-based recommendations and regression-like ones deserves a more detailed treatment as well. Further work can proceed in the following directions:

1. Further modification and adjustment of RAPS. 

2. Development of a second variant of the Pattern Structures-based recommender. We conjecture that applying the second derivation operation (the Galois operator from Eq. [?]) to more than one movie with high marks may yield relevant predictions as well.










3. Comparison with existing popular techniques, e.g. SVD and SVD++.